Abstract: Many researchers have hypothesised models which 
explain the evolution of the topology of a target network. The 
framework described in this paper gives the likelihood that the 
target network arose from the hypothesised model. This allows 
rival hypothesised models to be compared for their ability to 
explain the target network. A null model (of random evolution) 
is proposed as a baseline for comparison. The framework also 
considers models made from linear combinations of model 
components. A method is given for the automatic optimisation 
of component weights. The framework is tested on simulated 
networks with known parameters and also on real data. 

I. Introduction 

The field of modelling graph topologies (and in particular
the topology of the Internet) has generated a huge degree
of research interest in recent years (see [1, chapter 3] for
a review of the subject and [2] for an Internet topology
perspective). This paper introduces FETA (Framework for
Evolving Topology Analysis), which can be used to assess
potential underlying models for any network where information
about the network evolution is available. Previously,
many researchers have fitted probabilistic topology models by
growing candidate models and assessing how well their model
fitted against a selection of statistics made on a snapshot of
the real network. The FETA approach, by contrast, uses a
single statistic to get a rigorous estimate for the likelihood of a
model based upon the dynamic evolution of the network. This
paper concentrates on results from artificial models, showing that the
framework recovers known models. A companion paper [3]
reports on results from five real networks but does not present
the artificial test data given here.

It has been known for some time that a number of networks
follow an approximate power law in their degree distribution.
Such networks include the Internet Autonomous System (AS)
topology, the world wide web, co-authorship networks, sexual
contact networks, email, networks of actors, networks from
biology and many others (many references are in [1, table
3.1]). Researchers have attempted to grow artificial versions
of such networks with models which assign connection
probabilities to existing nodes based upon the graph topology.
Often surprisingly simple models replicate many features of
real networks, such as power laws. The celebrated Barabasi-Albert
(BA) model [4] provides an explanation for these in
terms of a "preferential attachment" model (the probability of
connecting to a node is exactly proportional to its degree).

Further models have given slightly different probabilities
and slightly different ways of connecting nodes to better
match the statistics of real graphs [5]-[8]. These models are
usually assessed by growing artificial networks and measuring
several representative statistics to compare with the real target
network. A few models work differently; for example, ORBIS
[9] does not "grow" a network by link addition but instead
"rescales" it. Willinger et al. [10] called for a "closing of
the loop" with a verification stage which checks how well
the proposed model fits the target network. FETA addresses
this validation problem. The FETA procedure evaluates the
dynamic evolution of a network, not a static snapshot. It
directly estimates a rigorous likelihood rather than relying on
several summary statistics, and this likelihood is estimated
directly from the network itself rather than by growing and
measuring an artificial network using the model to be tested.

II. Evaluation and optimisation framework 

Let G be some graph which evolves in time. Let $G_t$ be
the state of this graph at some step of evolution, t. Consider a
model for network evolution as consisting of two separate (but
interconnected) models. The outer model selects the operation
which transforms the graph between two steps. The inner
model chooses the entity for that operation. The operation
and the entity together define the transition from $G_{i-1}$ to $G_i$.
Both the outer and inner models may depend on the state of
the graph $G_i$ at step i of the evolution and possibly on
exogenous parameters. Outer model operations might be the
following:

1) Add a new node and connect it to an existing node. 

2) Connect the newest node to an existing node. 

3) Connect two existing nodes. 

4) Delete an existing connection. 

5) Delete an existing node and its connections. 

These outer models work with inner models which select either
nodes or edges for the operation. The inner model assigns
probabilities to each node (operations 1, 2 and 5) or edge
(operations 3 and 4). There may be a different inner model for
each outer model operation. The outer model might be adapted
further if the known graph data can include unconnected
(degree zero) nodes, if graphs can be unconnected and so on.
The focus of FETA is the inner model, and the outer model is
not discussed further here.

Example 1: The BA model [4] has a simple outer model
which performs step 1) then step 2) twice (a new node
connects to exactly three existing nodes). The inner model,
known as preferential attachment, assigns a probability to each
node exactly proportional to its degree. This inner model is
referred to in this paper as $\theta_d$. The positive feedback preference
(PFP) model [8] uses a parameterised outer model involving
several connections and an inner model which assigns node
probabilities where the probability of selecting a node with
degree d is proportional to $d^{1+\delta \log_{10} d}$, where $\delta$ is a parameter.
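As an illustration of the two inner models in Example 1, the Python sketch below turns a list of node degrees into selection probabilities for preferential attachment ($\theta_d$) and for the PFP inner model. It assumes every candidate node has degree at least one, so that $\log_{10} d$ is defined; the function names are hypothetical and not part of FETA.

```python
import math
from typing import List


def preferential_attachment(degrees: List[int]) -> List[float]:
    """theta_d: probability of each node exactly proportional to its degree."""
    total = sum(degrees)
    return [d / total for d in degrees]


def pfp(degrees: List[int], delta: float) -> List[float]:
    """PFP inner model: probability proportional to d ** (1 + delta * log10(d)).

    Assumes every degree is at least 1, so log10(d) is defined.
    """
    weights = [d ** (1.0 + delta * math.log10(d)) for d in degrees]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    degrees = [1, 2, 3, 10]
    print(preferential_attachment(degrees))  # [0.0625, 0.125, 0.1875, 0.625]
    print(pfp(degrees, 0.05))                # the degree-10 node gets more than 0.625
```

With $\delta = 0$ the PFP weights reduce to d and the two functions agree; a positive $\delta$ shifts probability towards the highest-degree nodes.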

A. Evaluating inner model likelihood 

Let $G_0$ be the graph at the first step of evolution observed
(this need not be right at the start of the evolution of the graph).
Assume that the state of the graph is observed until some step
$G_t$. The graph evolves between steps $G_{i-1}$ and $G_i$ according
to an outer and inner model. Each step involves the addition
of one edge. For simplicity of explanation consider the outer
model to consist only of the two operations:¹

1) add a new node and connect it to an existing node $N_i$,
or

2) connect the newest node to an existing node $N_i$.

The inner model $\theta$ assigns probabilities to the existing nodes
at a given step. Given the above outer model, the node $N_i$
chosen by the inner model can be inferred from $G_{i-1}$ and $G_i$.
Call the set of all observed choices $C = (N_1, \ldots, N_t)$.

Definition 1: An inner model $\theta$ is a map which at every
choice stage j maps a node i to a probability $p_j(i|\theta)$. A model
$\theta$ is a valid model if the probabilities sum to one over all
nodes, $\sum_i p_j(i|\theta) = 1$.

Theorem 1: Let $C = (N_1, \ldots, N_t)$ be the observed node
choices at steps $1, \ldots, t$ of the evolution of the graph G. Let $\theta$
be some hypothesised inner model which assigns a probability
$p_j(i|\theta)$ to node i at step j. The likelihood of the observed C
given $\theta$ is

$$L(C|\theta) = \prod_{j=1}^{t} p_j(N_j|\theta).$$

Proof: If $L(C_j|\theta)$ is the likelihood of the jth choice given
model $\theta$ then $L(C|\theta) = \prod_{j=1}^{t} L(C_j|\theta)$. Since $p_j(N_j|\theta)$ is the
probability model $\theta$ assigns to node $N_j$ at step j, it
is also the likelihood of choice $N_j$ at step j given model $\theta$.
The theorem follows. ■
¹ Note that the reason "add a new node" is not considered on its own is to
confine the study here to connected graphs.

If two inner models $\theta$ and $\theta'$ are hypothesised to explain
the node choices C arising from observations of a graph
$G_0, \ldots, G_t$ and a given outer model, then the one with the
higher likelihood is to be preferred.² In practice, for even
moderately sized graphs, this likelihood will be too small to be
represented in the floating-point arithmetic of most programming
languages, and the log likelihood $l(C|\theta) = \log(L(C|\theta))$ is more useful.
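A minimal Python sketch of Theorem 1 in log form, assuming the probabilities $p_j(N_j|\theta)$ of the observed choices have already been computed (the function name `log_likelihood` is hypothetical). It also shows why the raw product is impractical: multiplying 10,000 probabilities of $10^{-4}$ underflows to zero in double precision, while the sum of logs stays finite.

```python
import math
from typing import Sequence


def log_likelihood(choice_probs: Sequence[float]) -> float:
    """Return l(C|theta) = sum over j of log p_j(N_j|theta).

    choice_probs[j] is the probability that the hypothesised inner model
    assigned, at choice j, to the node N_j that was actually chosen.
    """
    if any(p <= 0.0 for p in choice_probs):
        # An observed choice given zero probability makes L(C|theta) = 0.
        return -math.inf
    # fsum keeps the sum accurate over many terms.
    return math.fsum(math.log(p) for p in choice_probs)


if __name__ == "__main__":
    probs = [1e-4] * 10_000
    print(math.prod(probs))       # 0.0: the raw likelihood underflows
    print(log_likelihood(probs))  # roughly -92103.4
```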

A common statistical measure is the deviance $D =
-2l(C|\theta)$. (The deviance is usually defined with respect to
a "saturated model"; in this case the saturated model $\theta_s$ is
the model which has $p_j(N_j|\theta_s) = 1$ for all $j \in 1, \ldots, t$ and
hence has $l(C|\theta_s) = 0$. The saturated model $\theta_s$ has likelihood
one but is useless for anything except exactly reproducing
$G_0, \ldots, G_t$.)

Definition 2: Let $\theta_0$ be the null model. Here, an appropriate
null model is the one which assigns equal probability to all 
nodes in the choice set (the random model). The choice set 
is either the set of all nodes or, if a simple graph is desired, 
the set of all nodes to which the new node does not already 
connect. 

The null model allows the assessment of the null deviance
$D_0 = -2(l(C|\theta) - l(C|\theta_0))$. However, both D and $D_0$ depend
heavily on the size of t (the number of choices made). A more
useful measure for this situation is now given.

Definition 3: Let $\theta$ be some inner model hypothesis for the
set of node choices $C = (N_1, \ldots, N_t)$. Let $\theta_A$ be some rival
model to compare $\theta$ with. The per choice likelihood ratio with
$\theta_A$, $c_A$, is the likelihood ratio normalised by t, the number of
choices. It is given by

$$c_A = \left[\frac{L(C|\theta)}{L(C|\theta_A)}\right]^{1/t} = \exp\left[\frac{l(C|\theta) - l(C|\theta_A)}{t}\right].$$

A value $c_A > 1$ indicates that $\theta$ is a better explanatory
model for the choice set C than $\theta_A$, and $c_A < 1$ indicates it is
worse. Particularly useful is $c_0$, the per choice likelihood ratio
relative to the null model. Note that for a fixed C, given the
$c_0$ statistic for two models $\theta$ and $\theta_A$, $c_A$ is the ratio of the
former over the latter, since the $l(C|\theta_0)$ terms cancel in the exponent.

In summary, the likelihood $L(C|\theta)$ gives the absolute
likelihood of a given model $\theta$ producing the choice set C arising
from a set of graphs $G_0, \ldots, G_t$. However, the per choice
likelihood ratio produces a result on a more comprehensible
scale.
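Once the log likelihoods are known, the deviance, the null deviance and the per choice likelihood ratio reduce to a few lines of arithmetic. The Python sketch below is illustrative: the function names are hypothetical, and the log-likelihood values in the usage lines are made-up numbers chosen only to exercise the formulas and the cancellation of $l(C|\theta_0)$.

```python
import math


def deviance(loglik: float) -> float:
    """D = -2 l(C|theta); the saturated model has l(C|theta_s) = 0."""
    return -2.0 * loglik


def null_deviance(loglik: float, loglik_null: float) -> float:
    """D_0 = -2 (l(C|theta) - l(C|theta_0))."""
    return -2.0 * (loglik - loglik_null)


def per_choice_ratio(loglik: float, loglik_rival: float, t: int) -> float:
    """c_A = exp((l(C|theta) - l(C|theta_A)) / t) for t observed choices."""
    return math.exp((loglik - loglik_rival) / t)


if __name__ == "__main__":
    t = 50
    l_theta, l_rival, l_null = -100.0, -120.0, -150.0  # illustrative values only
    c0_theta = per_choice_ratio(l_theta, l_null, t)    # c_0 for theta
    c0_rival = per_choice_ratio(l_rival, l_null, t)    # c_0 for theta_A
    c_a = per_choice_ratio(l_theta, l_rival, t)
    # c_A equals the ratio of the two c_0 values because l(C|theta_0) cancels.
    print(c_a, c0_theta / c0_rival)
```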

B. Fitting linear combinations of model components 

An inner model $\theta$ can be constructed from a linear
combination of other inner models. Let $\theta_1, \theta_2, \ldots$ be probability
models. A combined model can now be constructed from
component models as follows: $\theta = \beta_1\theta_1 + \beta_2\theta_2 + \cdots + \beta_n\theta_n$.
The $\beta_i$ are known as the component weights. The model $\theta$ is
a valid model if all $\beta_i \in (0, 1)$ and $\sum_i \beta_i = 1$. The weights $\beta_i$
that best explain C can be obtained using a fitting procedure
from statistics known as Generalised Linear Models (GLM).

² A model with fewer parameters will sometimes be preferred if the gain
in likelihood is small or the number of parameters added is large [11]; the
extreme case of this is the saturated model $\theta_s$.

Let $P_j(i) = 1$ if $i = N_j$, and $P_j(i) = 0$ otherwise.
The problem of finding the best model weights becomes the
problem of fitting the GLM $P_j(i) = \beta_1 p_j(i|\theta_1) + \beta_2 p_j(i|\theta_2) + \cdots + \epsilon$.
A GLM procedure can fit the $\beta_i$ parameters to find
the combined model which best fits the $P_j(i)$. This fit is
obtained by creating a data point for each choice j and for
each node i giving information about that node at that choice
time and also the value of $P_j(i)$.
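The data layout described above (one row per choice j and node i, holding each $p_j(i|\theta_k)$ and the response $P_j(i)$) can be sketched in Python with NumPy. The text suggests a GLM routine in a statistical language such as R; here, as an assumption, the GLM is taken to be an identity-link Gaussian model with no intercept, which is ordinary least squares, and uniform row subsampling is an assumed stand-in for the sampling step. Function names are hypothetical, and the usage lines fit synthetic choices drawn from a known two-component mixture.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np


def build_glm_rows(component_probs: Sequence[np.ndarray],
                   chosen: Sequence[int]) -> Tuple[np.ndarray, np.ndarray]:
    """Stack one data row per (choice j, node i).

    component_probs[j] has shape (n_j, K): column k holds p_j(i|theta_k) for
    each of the n_j candidate nodes at choice j. chosen[j] is the row index
    of N_j. Returns the design matrix X and the 0/1 response P_j(i).
    """
    xs: List[np.ndarray] = []
    ys: List[np.ndarray] = []
    for probs, n_j in zip(component_probs, chosen):
        probs = np.asarray(probs, dtype=float)
        y = np.zeros(probs.shape[0])
        y[n_j] = 1.0  # P_j(i) = 1 only for the node actually chosen
        xs.append(probs)
        ys.append(y)
    return np.vstack(xs), np.concatenate(ys)


def fit_weights(X: np.ndarray, y: np.ndarray, sample_size: Optional[int] = None,
                seed: int = 0) -> np.ndarray:
    """Estimate beta in P_j(i) = sum_k beta_k p_j(i|theta_k) + eps.

    Assumption: identity-link Gaussian GLM without intercept, i.e. ordinary
    least squares; rows are optionally subsampled uniformly at random.
    """
    if sample_size is not None and sample_size < len(y):
        rows = np.random.default_rng(seed).choice(len(y), size=sample_size, replace=False)
        X, y = X[rows], y[rows]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_nodes, n_choices = 50, 10_000
    uniform = np.full(n_nodes, 1.0 / n_nodes)   # a theta_0-like component
    w = (np.arange(n_nodes) + 1.0) ** 2
    skewed = w / w.sum()                         # a degree-like component
    comps = np.column_stack([uniform, skewed])
    true_beta = np.array([0.3, 0.7])
    chosen = rng.choice(n_nodes, size=n_choices, p=comps @ true_beta)
    X, y = build_glm_rows([comps] * n_choices, chosen)
    print(fit_weights(X, y), fit_weights(X, y, sample_size=200_000))
```

Unlike a constrained optimiser, least squares can return weights outside (0, 1), which mirrors the caveat about unconstrained GLM estimates.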

GLM fitting in a statistical language such as R³ can be used
to find the choice of $\beta_i$ which maximises the likelihood of this
model. This is equivalent to finding the $\beta_i$ which give the
maximum likelihood for $\theta$, since for model $\theta$ the expectation
$E[P_j(i)] = p_j(i|\theta)$. The fitting procedure estimates, for each
$\beta_i$, the value, the error and the statistical significance.

Because this procedure requires one line of data for each
node at each choice, it produces a large amount of data,
and sampling is necessary. As will be seen in section III-B, the
method still recovers parameters accurately.

C. FETA in practice 

For simplicity of discussion in previous sections, only 
operations which connected a new node to a single node were 
considered. Using the framework to connect edges between 
existing internal nodes requires a small extension. Since the 
number of potential edges is roughly the square of the number 
of nodes, it makes sense to decompose the choice of an 
edge into the choice of a start node and an end node. Once 
a start node is picked, the choice set for the end node 
can be constrained to ensure the graph remains simple. The 
likelihood of adding edge (x, y) is calculated as the likelihood 
of choosing node x then node y plus the likelihood of choosing 
node y then node x. For the purposes of Definition 3 an edge
counts as two choices (since Definition 3 is in terms of node
choices).
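A sketch of the edge decomposition above, in Python, assuming an adjacency-set representation of a simple graph and an inner model passed in as a function that returns probabilities over a candidate set (both conventions, and all names, are hypothetical). The end-node candidates exclude the start node and its existing neighbours, so an edge that already exists gets likelihood zero.

```python
from typing import Callable, Dict, Set

Adjacency = Dict[int, Set[int]]
InnerModel = Callable[[Adjacency, Set[int]], Dict[int, float]]


def degree_model(adj: Adjacency, candidates: Set[int]) -> Dict[int, float]:
    """Preferential attachment restricted to a candidate set."""
    total = sum(len(adj[v]) for v in candidates)
    return {v: len(adj[v]) / total for v in candidates}


def edge_likelihood(adj: Adjacency, x: int, y: int, inner: InnerModel) -> float:
    """Likelihood of adding edge (x, y): P(x then y) + P(y then x)."""

    def ordered(start: int, end: int) -> float:
        p_start = inner(adj, set(adj))[start]
        # Constrain the end node so the graph stays simple.
        candidates = set(adj) - {start} - adj[start]
        p_end = inner(adj, candidates).get(end, 0.0)
        return p_start * p_end

    return ordered(x, y) + ordered(y, x)


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # the path 0-1-2-3
    non_edges = [(0, 2), (0, 3), (1, 3)]
    print({e: edge_likelihood(path, *e, degree_model) for e in non_edges})
    # When every start node has at least one candidate end node, the
    # likelihoods of all possible new edges sum to one (up to rounding).
    print(sum(edge_likelihood(path, *e, degree_model) for e in non_edges))
```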

The outer model could be further generalised by, for ex- 
ample, adding the possibility of a "bare" node appearing (a 
node with no links) if this event could be observed. Another 
extension would be adding node or edge deletion operations. 
Separate inner models can be fitted to different outer model 
operations. For example, in the work on FETA reported in [3]
separate models are fitted to the outer model operations which 
connect a single existing node to a new node and the outer 
model operations which connect an edge between existing 
internal nodes. Likelihoods from the two parts of the inner 
model can be directly combined by multiplication. 

Another practical concern is scalability: how the likelihood
computation time increases as graphs become large. Tests
were run on a 2.66 GHz quad-core Xeon CPU using the same
codebase for two tasks, one to measure the likelihood of a
target network arising from a given model and the second to
actually create a network. The number of links created was
varied from 1,000 to 100,000. While the running time of both
processes increased approximately as $O(n^2)$, where n is the
number of links, the likelihood calculation is much quicker than
the network creation process. For 100,000 links the likelihood
calculation took 53 seconds and the network creation took 2,600
seconds. Compared with producing a test network and measuring it,
the FETA approach is extremely efficient. If the runtime were
to become onerous, sampling could be used as it is in the
GLM procedure. This was not necessary for the results in this
paper.

³ http://www.r-project.org/

It is worth briefly noting two points about data requirements.
Firstly, FETA does not require data from the entire history of a
network; the graph $G_0$ can be any stage of graph construction.
Secondly, for a sufficiently large graph, knowing the exact
order of link arrival should not be necessary (the exact order may
be unknown if the graph state is measured periodically rather than
recorded as every node or edge arrives). A graph with a large number of
nodes will not change its topology greatly over a small number
of arrivals, and therefore a small reordering of link arrival order
should make little difference to the model likelihood. Future
work will seek to quantify the inaccuracies introduced by this
reordering.

III. Testing the framework 

The obvious way to test the framework is on simulated data
sets where the underlying inner model is known. Testing
models using the likelihood procedure from section II-A is
demonstrated in section III-A. Optimising models using the GLM
procedure from section II-B is done in section III-B. A demonstration
on real data is described in section III-C.

Let $d_i$ be the degree of node i and $t_i$ be its triangle count
(the number of triangles, or 3-cycles, the node is in). The
model components used in the testing are the following: $\theta_0$,
the null model (random model), assumes all nodes have
equal probability $p_i = k_0$; $\theta_d$, the degree model (preferential
attachment), assumes node probability $p_i = k_d d_i$; $\theta_t$, the
triangle model, assumes node probability $p_i = k_t t_i$; $\theta_s$, the
singleton model, assumes node probability $p_i = k_s$ if $d_i = 1$
and $p_i = 0$ otherwise; $\theta_D$, the doubleton model, assumes node
probability $p_i = k_D$ if $d_i = 2$ and $p_i = 0$ otherwise; $\theta_R(n)$,
the "recent" model, has $p_i = k_R$ if the node was one of those selected
in the last n selections and $p_i = 0$ otherwise; and $\theta_p(\delta)$, the
PFP model, assumes node probability $p_i = k_p d_i^{1+\delta \log_{10} d_i}$.
The $k_\cdot$ are all normalising constants to ensure $\sum_i p_i = 1$.
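The component models listed above can be written as small Python functions over an adjacency-set graph. This is a sketch under stated assumptions: node labels are integers, every node has degree at least one (needed by $\theta_p$), a component with no eligible node raises an error rather than being given some default, and all function names are hypothetical. The `mix` helper forms the linear combinations used in section II-B.

```python
import math
from typing import Dict, List, Set

Adjacency = Dict[int, Set[int]]
Probs = Dict[int, float]


def _normalise(weights: Dict[int, float]) -> Probs:
    """Apply the normalising constant k so the probabilities sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no node has non-zero weight under this component")
    return {i: w / total for i, w in weights.items()}


def triangles(adj: Adjacency) -> Dict[int, int]:
    """t_i: number of triangles (3-cycles) containing node i."""
    return {i: sum(1 for u in adj[i] for v in adj[i] if u < v and v in adj[u])
            for i in adj}


def theta_0(adj: Adjacency) -> Probs:   # null / random
    return _normalise({i: 1.0 for i in adj})


def theta_d(adj: Adjacency) -> Probs:   # degree (preferential attachment)
    return _normalise({i: float(len(adj[i])) for i in adj})


def theta_t(adj: Adjacency) -> Probs:   # triangle
    return _normalise({i: float(t) for i, t in triangles(adj).items()})


def theta_s(adj: Adjacency) -> Probs:   # singleton: degree-one nodes only
    return _normalise({i: float(len(adj[i]) == 1) for i in adj})


def theta_D(adj: Adjacency) -> Probs:   # doubleton: degree-two nodes only
    return _normalise({i: float(len(adj[i]) == 2) for i in adj})


def theta_R(adj: Adjacency, history: List[int], n: int) -> Probs:
    """Recent: nodes among the last n selections (history is in selection order)."""
    recent = set(history[-n:]) if n > 0 else set()
    return _normalise({i: float(i in recent) for i in adj})


def theta_p(adj: Adjacency, delta: float) -> Probs:   # PFP; assumes d_i >= 1
    return _normalise({i: len(adj[i]) ** (1.0 + delta * math.log10(len(adj[i])))
                       for i in adj})


def mix(components: List[Probs], betas: List[float]) -> Probs:
    """Linear combination theta = sum_k beta_k theta_k."""
    return {i: sum(b * c[i] for b, c in zip(betas, components)) for i in components[0]}


if __name__ == "__main__":
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}  # triangle 0-1-2 plus pendant node 3
    theta = mix([theta_p(g, 0.05), theta_t(g)], [0.5, 0.5])
    print(theta, sum(theta.values()))
```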

A. Testing the likelihood framework 

The best way to test the likelihood framework is on
simulated networks with a known underlying inner model.
Test model one has a simple outer model which creates
a new node and then connects it to exactly three distinct
nodes. The inner model $\theta_1$ which chooses these nodes is
$\theta_1 = 0.5\theta_p(0.05) + 0.5\theta_t$. That is, it is 50% the PFP inner
model with $\delta = 0.05$ and 50% the triangle model. Naturally,
nodes with a high number of triangles also tend to have a high degree,
so these model components are, to some extent, correlated.
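To make the testing procedure concrete, the Python sketch below grows a network in the way test model one is described: each new node connects to three distinct existing nodes chosen by $\theta_1 = 0.5\theta_p(0.05) + 0.5\theta_t$, with the graph updated after every edge, and the chosen nodes are recorded as the choice set C used by the likelihood. Several details are assumptions, not taken from the text: the complete-graph seed, Python's `random` module as the source of randomness, the exclusion of nodes already joined to the new node, and a fallback to PFP alone when no candidate lies in a triangle. Function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Set

Adjacency = Dict[int, Set[int]]


def theta_1(adj: Adjacency, cands: List[int], beta_p: float = 0.5,
            delta: float = 0.05) -> Dict[int, float]:
    """beta_p * theta_p(delta) + (1 - beta_p) * theta_t over the candidate nodes."""
    wp = {i: len(adj[i]) ** (1.0 + delta * math.log10(len(adj[i]))) for i in cands}
    wt = {i: sum(1 for u in adj[i] for v in adj[i] if u < v and v in adj[u])
          for i in cands}
    sp, st = sum(wp.values()), sum(wt.values())
    if st == 0:  # assumed fallback: no candidate is in a triangle, use PFP alone
        return {i: wp[i] / sp for i in cands}
    return {i: beta_p * wp[i] / sp + (1 - beta_p) * wt[i] / st for i in cands}


def grow(adj: Adjacency, new_nodes: int, m: int = 3, seed: int = 0) -> List[int]:
    """Add new_nodes nodes, each linked to m distinct existing nodes; return C."""
    rng = random.Random(seed)
    choices: List[int] = []
    for _ in range(new_nodes):
        new = max(adj) + 1
        adj[new] = set()
        for _ in range(m):
            # Keep the graph simple: no self-loop and no repeated edge.
            cands = [i for i in adj if i != new and i not in adj[new]]
            probs = theta_1(adj, cands)
            node = rng.choices(cands, weights=[probs[i] for i in cands])[0]
            adj[new].add(node)
            adj[node].add(new)
            choices.append(node)
    return choices


if __name__ == "__main__":
    g = {i: {j for j in range(4) if j != i} for i in range(4)}  # seed: complete graph K4
    c = grow(g, new_nodes=50)
    print(len(c), sum(len(nbrs) for nbrs in g.values()) // 2)  # 150 choices, 156 edges
```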

An artificial network was grown with 10,000 edges using
the model described above. Assuming that the model
was known to be of the form $\beta_p\theta_p(\delta) + \beta_t\theta_t$ then, since
$\beta_p + \beta_t = 1$, a sweep of the parameters $\delta$ and $\beta_t$ should
give a likelihood surface with a maximum at the correct
values of $\beta_t$ and $\delta$. The values tried were all possible
combinations of $\beta_t = (0.1, 0.15, \ldots, 0.85, 0.9)$ and $\delta =
(0.01, 0.0125, \ldots, 0.0875, 0.09)$. The likelihood surface
produced is shown in Figure 1 with contour lines projected below.
As can be seen, the maximum likelihood is in the correct part
of the region ($\beta_t = 0.5$, $\delta = 0.05$). In fact the highest $c_0$ was
with $\delta = 0.0525$ and $\beta_t = 0.5$, an almost exact recovery of
the correct parameters.
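The parameter sweep can be sketched in Python as a grid search that scores each $(\beta_t, \delta)$ pair by $c_0$, using the grids given in the text. The record format (per choice: the candidate degrees, the candidate triangle counts and the index of the chosen node) is a hypothetical convenience, not the FETA data format, and the one-record toy input in the usage lines only exercises the code. The sketch assumes every candidate degree is at least one and that each choice has at least one candidate in a triangle.

```python
import math
from typing import List, Sequence, Tuple

import numpy as np

# One record per observed choice j: (degrees, triangle counts, index of N_j),
# where the arrays cover the candidate nodes at that choice.
Record = Tuple[Sequence[float], Sequence[float], int]


def mixture_loglik(records: List[Record], beta_t: float, delta: float) -> float:
    """l(C|theta) for theta = (1 - beta_t) theta_p(delta) + beta_t theta_t."""
    total = 0.0
    for deg, tri, k in records:
        deg = np.asarray(deg, dtype=float)   # assumes every degree >= 1
        tri = np.asarray(tri, dtype=float)   # assumes at least one triangle
        w = deg ** (1.0 + delta * np.log10(deg))
        p = (1.0 - beta_t) * w[k] / w.sum() + beta_t * tri[k] / tri.sum()
        total += math.log(p)
    return total


def null_loglik(records: List[Record]) -> float:
    """theta_0: each choice is uniform over its candidates."""
    return -sum(math.log(len(deg)) for deg, _, _ in records)


def sweep(records: List[Record], beta_grid, delta_grid):
    """Return (c_0, beta_t, delta) for the grid point with the largest c_0."""
    t, l0 = len(records), null_loglik(records)
    best = None
    for b in beta_grid:
        for d in delta_grid:
            c0 = math.exp((mixture_loglik(records, b, d) - l0) / t)
            if best is None or c0 > best[0]:
                best = (c0, float(b), float(d))
    return best


if __name__ == "__main__":
    # Grids from the text: beta_t = 0.1, 0.15, ..., 0.9; delta = 0.01, 0.0125, ..., 0.09.
    beta_grid = np.round(np.arange(0.10, 0.90001, 0.05), 4)
    delta_grid = np.round(np.arange(0.01, 0.09001, 0.0025), 4)
    toy = [([1, 100], [1, 0], 0)]   # one toy choice: a low-degree node in a triangle
    print(sweep(toy, beta_grid, delta_grid))
```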




Fig. 1. A likelihood surface for the model $\theta_1$ with a contour plot beneath.

Test model two has an outer model which connects a
new node to either one or two distinct inner nodes (with equal
probability of each). The inner model $\theta_2$ is given by $\theta_2 =
0.25\theta_0 + 0.25\theta_t + 0.25\theta_s + 0.25\theta_D$. Again 10,000 edges were
generated using this model. A few models with parameters similar
to those of $\theta_2$ were evaluated on this network and compared with $\theta_2$ itself.



Model | $c_0$
$\theta_2 = 0.25\theta_0 + 0.25\theta_t + 0.25\theta_s + 0.25\theta_D$ | 2.45188
$0.2\theta_0 + 0.3\theta_t + 0.25\theta_s + 0.25\theta_D$ | 2.43070
$0.25\theta_0 + 0.25\theta_t + 0.3\theta_s + 0.2\theta_D$ | 2.43474
$0.2\theta_0 + 0.25\theta_t + 0.3\theta_s + 0.25\theta_D$ | 2.43549
$0.24\theta_0 + 0.25\theta_t + 0.26\theta_s + 0.25\theta_D$ | 2.45135



As can be seen, even the final model, which has extremely
close parameters, produces a slightly lower $c_0$ value. With
three free parameters in the model, an exhaustive state space
search could be quite time consuming. If the network were bigger,
or more parameters were required in a test (a real network
would not have known model components), a brute-force state
space search would be intractable. For models with many
parameters the $c_0$ statistic could be used as a fitness function
for an optimisation procedure such as a genetic algorithm.
Alternatively, for linear parameters, the GLM fitting from
section II-B can be used, and these tests are performed in the
next section.

B. Testing the parameter optimisation 

The next stage is to test the GLM fitting procedure described
in section II-B on artificial models. This can, in theory, retrieve
parameters from models produced by linear combinations of 
model components. In this section, statistical significances 
from the GLM procedure are quoted at the 10%, 5%, 1% 
or 0.1% levels. 



First tests were performed on $\theta_1 = 0.5\theta_p(0.05) + 0.5\theta_t$
as described in the previous section. The test network again
had 10,000 edges. Sampling was used to generate just over
4,000,000 items of data for the GLM fit. Fitting $\theta =
\beta_p\theta_p(0.05) + \beta_t\theta_t$ gave the following results.



Parameter | Estimate | Significance
$\beta_p$ | 0.53 ± 0.031 | 0.1%
$\beta_t$ | 0.47 ± 0.031 | 0.1%



The parameters were recovered almost exactly. However,
this assumed that $\delta$ was known precisely. If $\delta$ is not known,
the GLM procedure still behaves reasonably with an incorrect $\delta$.
The table below shows fits of the model with $\delta = 0.2$ and
$\delta = 0.01$, considerably above and below the correct value of 0.05.



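Fit with $\delta = 0.2$:
Parameter | Estimate | Significance
$\beta_p$ | 0.12 ± 0.022 | 0.1%
$\beta_t$ | 0.84 ± 0.021 | 0.1%

Fit with $\delta = 0.01$:
Parameter | Estimate | Significance
$\beta_p$ | 0.43 ± 0.025 | 0.1%
$\beta_t$ | 0.57 ± 0.025 | 0.1%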



In both cases the model correctly gave statistical
significance to the $\theta_p$ component of the model. The actual estimates
were not 0.5, nor were they expected to be. The true $\delta$
parameter could be found by trying a range of values within
the GLM procedure, just as it was with the likelihood estimator
in Figure 1.

For realistic scenarios, the true underlying model is not
known. Thus some "misspecified" models (models known to
be incorrect) were tried to see whether incorrect components
could be identified. The model $\theta = \beta_d\theta_d + \beta_t\theta_t + \beta_0\theta_0$ was
fitted; it includes the extraneous $\theta_d$ (preferential attachment) and $\theta_0$
(null or random) components and omits the true $\theta_p(0.05)$ component.



Parameter | Estimate | Significance
$\beta_d$ | 0.46 ± 0.057 | 0.1%
$\beta_t$ | 0.57 ± 0.031 | 0.1%
$\beta_0$ | -0.031 ± 0.032 | none



The $\theta_0$ component has been rejected, having both a low value
and a low statistical significance. The $\theta_d$ model has stayed in,
almost certainly because it has such a strong correspondence
with the $\theta_p(\delta)$ model; indeed, for $\delta = 0$ it is the same model.

The GLM fitting procedure does not always produce the
correct answer; in particular, problems can occur when $\theta_d$ and
$\theta_p$ are included in the same fitting procedure. Fitting $\theta =
\beta_d\theta_d + \beta_p\theta_p(0.05) + \beta_t\theta_t$ gives the following.



Parameter | Estimate | Significance
$\beta_d$ | 0.28 ± 0.085 | 0.1%
$\beta_p$ | 0.18 ± 0.11 | none
$\beta_t$ | 0.54 ± 0.038 | 0.1%



Here the GLM procedure gave an incorrect answer. The
$\theta_p(\delta)$ model was incorrectly rejected and given no statistical
significance. This kind of error is common when $\theta_d$ and
$\theta_p(\delta)$ are combined in the same model. This model gives
$c_0 = 5.17$ compared with $c_0 = 5.18$ for the correct model, so
the likelihood still identifies the correct model even when
the GLM procedure fits an incorrect model.

The GLM procedure was next used to recover parameters
from $\theta_2 = 0.25\theta_0 + 0.25\theta_t + 0.25\theta_s + 0.25\theta_D$. The test network
had 10,000 edges as previously. Sampling was used to obtain
just over 3.5 million data points for model fitting.



Parameter | Estimate | Significance
$\beta_0$ | 0.23 ± 0.021 | 0.1%
$\beta_t$ | 0.28 ± 0.017 | 0.1%
$\beta_s$ | 0.24 ± 0.016 | 0.1%
$\beta_D$ | 0.25 ± 0.020 | 0.1%



As can be seen, this recovery of parameters was quite
successful, although $\beta_t$ is actually 0.25 and therefore slightly
outside the error range 0.28 ± 0.017. The next test was to add
a spurious model component, $\theta_d$.



Parameter | Estimate | Significance
$\beta_0$ | 0.33 ± 0.059 | 0.1%
$\beta_t$ | 0.29 ± 0.017 | 0.1%
$\beta_s$ | 0.24 ± 0.016 | 0.1%
$\beta_D$ | 0.23 ± 0.022 | 0.1%
$\beta_d$ | -0.089 ± 0.059 | 5%



The $\beta_d$ parameter was given a negative value (which is
likely to produce an invalid model for the likelihood estimate),
and its relatively low statistical significance also suggests
$\theta_d$ should be removed from the model. An important caveat
exemplified here is that the GLM procedure is not constrained to
produce $\beta$ parameters in the range (0, 1). This needs to be
considered when analysing model fitting.
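Because the GLM estimates are unconstrained, a quick check before passing fitted weights to the likelihood estimator can catch invalid combinations. The Python sketch below is hypothetical: it only flags estimates outside (0, 1) and compares their sum with one using an arbitrary tolerance; deciding whether to drop a component still rests on its significance and on the likelihood. The usage lines reuse the estimates from the table above.

```python
from typing import Dict, List, Tuple


def check_weights(estimates: Dict[str, float],
                  tol: float = 0.05) -> Tuple[List[str], bool]:
    """Return the components whose estimate lies outside (0, 1), and whether the
    estimates sum to one within tol (tol is an arbitrary illustrative choice)."""
    outside = [name for name, beta in estimates.items() if not 0.0 < beta < 1.0]
    sums_to_one = abs(sum(estimates.values()) - 1.0) <= tol
    return outside, sums_to_one


if __name__ == "__main__":
    # Estimates from the fit with the spurious theta_d component.
    fit = {"beta_0": 0.33, "beta_t": 0.29, "beta_s": 0.24, "beta_D": 0.23, "beta_d": -0.089}
    print(check_weights(fit))   # (['beta_d'], True)
```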

In most circumstances tested, the GLM procedure performed
extremely well. When the correct model was tested, the
correct results were obtained, and spurious model components
were only accepted if they correlated strongly with genuine
model components. The GLM procedure is a very useful tool for
exploratory data analysis, but the likelihood framework remains
the true test of model fit to data.

C. Tests on real data 

Tests on five different data sets are reported in [3]. Here, for
space reasons, only one network is reported: the Route Views
AS network, a view of the AS topology collected by the
University of Oregon Route Views project⁴. The data set gives
the growth of the AS topology from 42,000 edges to over
90,000. Throughout this section, it is important to keep in
mind the aim of this paper, to test the FETA framework. The
models described here are not claimed to be the best known
models for the network in question. The PFP model [8] with
its special outer model gets a closer match to the final network
statistics. The ORBIS model [9] does not model evolution but
is very good at matching statistics on a target network. The
model presented here as "best" is the best model found using
the FETA framework with a simple outer model. The claim
being verified in this section is not that this is the best possible
model of the real network but that models can be assessed and
optimised using the FETA framework without looking at any
target statistics other than likelihood.

⁴ http://www.routeviews.org

Three inner models were compared on the Route Views AS
network. The outer model was simple: the choice of operation
(add new node, add link to new node or add inner edge) was
exactly the sequence observed in the real data. The inner
model $\theta_0$ was used as a base for comparison. The other two
models were a "pure" PFP model $\theta_p(0.005)$ (but without the PFP
special outer model) and the "best" model found, which
was $0.81\theta_p(0.014) + 0.17\theta_R(1)$ (PFP + "recent") to connect
new nodes and $0.71\theta_d + 0.22\theta_R(1) + 0.07\theta_s$ (preferential
attachment + "recent" + singleton) to connect edges between
existing nodes. The PFP model $\theta_p(0.005)$ had $c_0 = 4.81$ and
the "best" model had $c_0 = 8.06$. From these results PFP and
"best" should be a significant improvement on random, and
"best" should be better than PFP. These modelling results
should not be taken as a criticism of PFP as described in
[8], since the special "interactive growth" outer model of that
paper was not used (the focus here is on the inner model).

Each model grew a test network from the seed network of
42,000 edges. Figures 2 and 3 show the evolution of various
graph statistics for the real network compared with the three
models. The first point in each plot is after edge 40,000, which
is still within the seed network, so all models necessarily agree
there. The statistics are $d_1$ and $d_2$, the proportions of nodes
of degree one and two; max d, the degree of the highest-degree node;
$\overline{d^2}$, the mean square node degree; the assortativity coefficient
r; and the clustering coefficient $\gamma$. See [2] for full descriptions
of these statistics. (Note that the mean degree $\bar{d}$ is fixed by the
outer model and is an exact match to the real topology.)
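The degree-based statistics plotted in Figures 2 and 3 can be computed from an adjacency-set graph as in the Python sketch below. The assortativity function uses Newman's standard edge-based formula for the degree correlation coefficient r; the clustering coefficient $\gamma$ is omitted because several definitions are in common use, and [2] should be consulted for the intended one. Function names are hypothetical.

```python
from typing import Dict, Set

Adjacency = Dict[int, Set[int]]


def degree_statistics(adj: Adjacency) -> Dict[str, float]:
    """d_1, d_2 (proportions of degree-one/two nodes), max d, mean d and mean d^2."""
    degs = [len(nbrs) for nbrs in adj.values()]
    n = len(degs)
    return {
        "d1": sum(d == 1 for d in degs) / n,
        "d2": sum(d == 2 for d in degs) / n,
        "max_d": max(degs),
        "mean_d": sum(degs) / n,
        "mean_d2": sum(d * d for d in degs) / n,
    }


def assortativity(adj: Adjacency) -> float:
    """Newman's degree assortativity r, counting each undirected edge once.

    Undefined (division by zero) when every edge endpoint has the same degree,
    e.g. in a regular graph.
    """
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    m = len(edges)
    s_prod = sum(len(adj[u]) * len(adj[v]) for u, v in edges) / m
    s_mean = sum((len(adj[u]) + len(adj[v])) / 2 for u, v in edges) / m
    s_sq = sum((len(adj[u]) ** 2 + len(adj[v]) ** 2) / 2 for u, v in edges) / m
    return (s_prod - s_mean ** 2) / (s_sq - s_mean ** 2)


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(degree_statistics(star), assortativity(star))  # a star is fully disassortative: r = -1
```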

As mentioned at the start of this section, the claim is not 
that these models are a perfect fit to the evolution of the 
target network but, instead, that the order in which they fit 
the target network is that given by the likelihood estimator: the 
"best" model being better than pure PFP, and both being much 
better than random. The models and the $c_0$ measures which
predicted this were produced before any artificial topologies 
were generated and without reference to the graph statistics 
plotted in the figures. This is a convincing demonstration that 
the likelihood measure translates directly into fit to real data 
over a number of statistical measures. 

For most statistics, the ordering seems correct, with "best"
being closest to real, followed by PFP and then random.
An exception is in the graphs for $\gamma$ and r, where PFP is
slightly better than "best". However, in $d_1$ and max d the
PFP model is approximately the same as random, when we
would expect it to be better. In the case of max d, random
predicts unrealistically slow growth. For some statistics, none of
the models is close (for reproducing the statistics of a graph
snapshot it seems likely that ORBIS, for example, might be
better). However, the framework has clearly shown its ability
to assess which model best fits a target graph, and this is
reflected in these statistics.
[Figures 2 and 3: each panel plots a statistic against network size (links) for Real, Random, PFP and Best.]

Fig. 2. The evolution of the $d_1$ (left), $d_2$ (center) and max d (right) parameters.

Fig. 3. The evolution of the assortativity coefficient (left), clustering coefficient (center) and $\overline{d^2}$ (right).



IV. Conclusions 

The Framework for Evolving Topology Analysis (FETA) is 
a useful toolset for investigating growth models of networks 
where evolution information is available. Network growth 
models were described in terms of an outer model (which 
selected the operation to perform on the graph) and an 
inner model (which selected the entity for the operation). 
A likelihood statistic was given for an inner model giving 
rise to a target network. The likelihood statistic given is 
rigorous and quick to calculate. It has been shown to
recover the parameters of a known model from a network grown
using that model. A method was given for exploring and 
optimising linear combinations of model components and this 
was tested successfully. The fitting procedure can give insight 
into what model components are required to best fit the data. 
Models output by the fitting procedure can then be assessed 
precisely using the likelihood measure. FETA has been tested 
on real data from five networks, one of which was presented 
in this paper. The likelihood measure was found to be a 
good predictor of how well a network grown from a given 
model would match the statistics of the real data. The models 
presented here were not perfect at capturing the evolution of 
the AS graph. Different inner model components would be 
needed to improve this. 

Much more can be achieved with the statistical analysis
of network growth. A similar likelihood approach could be
applied to the outer model. Inner models which themselves
change in time would be another improvement. Models
constructed multiplicatively from components ($\theta_1^{\beta_1}\theta_2^{\beta_2}\cdots$) would
seem natural, but normalisation problems exist. Network
models could be considered which remove nodes or edges
as well as add them and which do not necessarily remain
connected. Finding new data sets to apply the method to is
also a priority. Other researchers are encouraged to download
and try the software and data⁵.

References 

[1] S. Bornholdt and H. G. Schuster, Eds., Handbook of Graphs and Networks. Wiley, 2003.
[2] H. Haddadi, G. Iannaccone, A. Moore, R. Mortier, and M. Rio, "Network topologies: Inference, modelling and generation," IEEE Comm. Surveys and Tutorials, vol. 10, no. 2, 2008.
[3] R. G. Clegg, R. Landa, U. Harder, and M. Rio, "Evaluating and optimising models of network growth," 2009, http://arxiv.org/abs/0904.0785
[4] A. L. Barabasi and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509-512, 1999.
[5] P. Holme, J. Karlin, and S. Forrest, "An integrated model of traffic, geography and economy in the internet," SIGCOMM Comput. Commun. Rev., vol. 38, no. 3, pp. 5-16, 2008.
[6] R. Albert and A.-L. Barabasi, "Topology of evolving networks: local events and universality," Physical Review Letters, vol. 85, p. 5234, 2000.
[7] T. Bu and D. Towsley, "On distinguishing between Internet power law topology generators," in Proceedings of IEEE INFOCOM, New York, NY, Jun. 2002.
[8] S. Zhou and R. J. Mondragon, "Accurately modeling the Internet topology," Phys. Rev. E, vol. 70, no. 066108, pp. 1-7, 2004.
[9] P. Mahadevan, C. Hubble, D. Krioukov, B. Huffaker, and A. Vahdat, "Orbis: Rescaling degree correlations to generate annotated Internet topologies," in Proceedings of ACM SIGCOMM, Kyoto, Japan, 2007.
[10] W. Willinger, R. Govindan, S. Jamin, V. Paxson, and S. Shenker, "Scaling phenomena in the Internet: critically examining criticality," in Proceedings of the National Academy of Sciences, vol. 99, 2002, pp. 2573-2580.
[11] H. Akaike, "A new look at the statistical model identification," IEEE Trans. on Auto. Control, vol. 19, no. 6, pp. 716-723, 1974.



⁵ http://www.richardclegg.org/software/FETA



